Abstract. We investigate the problem of stable in-place merging from 
a ratio-based point of view with ratio $k = n/m$, where $m, n$ are the sizes 
of the input sequences with $m \le n$. We introduce a novel algorithm for this problem 
that is asymptotically optimal regarding the number of assignments as 
well as comparisons. Our algorithm uses knowledge about the ratio of 
the input sizes to gain optimality and does not stay in the tradition of 
Mannila and Ukkonen’s work [8] in contrast to all other stable in-place 
merging algorithms proposed so far. It has a simple modular structure 
and does not demand the additional extraction of a movement imitation 
buffer as needed by its competitors. For its core components we give 
concrete implementations in the form of pseudo-code. Benchmarks show 
that our algorithm almost always performs better than its direct 
competitor proposed in [6]. 

As an additional sub-result we show that stable in-place merging is a quite 
simple problem for every ratio $k \ge \sqrt{m}$ by proving that there exists a 
primitive algorithm that is asymptotically optimal for such ratios. 


1 Introduction 


Merging denotes the operation of rearranging the elements of two adjacent sorted 
sequences of size m and n, so that the result forms one sorted sequence of m+n 
elements. An algorithm merges two sequences in place when it relies on a fixed 
amount of extra space. It is regarded as stable, if it preserves the initial ordering 
of elements with equal value. 

There are two significant lower bounds for merging. The lower bound for the 
number of assignments is m + n because every element of the input sequences 
can change its position in the sorted output. As shown e.g. in Knuth [7] the 
lower bound for the number of comparisons is $\Omega(m \log(\frac{n}{m} + 1))$, where $m \le n$. 
A merging algorithm is called asymptotically fully optimal if it is asymptotically 
optimal regarding the number of comparisons as well as assignments. 

We will inspect the merging problem on the foundation of a ratio based approach. 
In the following k will always denote the ratio $k = \frac{n}{m}$ of the sizes of the input 
sequences. The lower bounds for merging can be expressed on the foundation of 
such a ratio as well. We get $\Omega(m \log(k + 1))$ as lower bound for the number of 
comparisons and $m \cdot (k + 1)$ as lower bound for the number of assignments. 
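
The two ratio-based expressions can be evaluated directly. The following is a minimal sketch; the function name `lower_bounds` and the use of the base-2 logarithm are assumptions made for illustration, and the comparison bound is only meaningful up to a constant factor.

```python
import math

def lower_bounds(m, n):
    """Return (m * log2(k + 1), m * (k + 1)) for the ratio k = n / m.

    The first value is the expression inside the Omega bound for
    comparisons, the second is the lower bound m + n for assignments.
    """
    k = n / m
    return m * math.log2(k + 1), m * (k + 1)

# Example: m = 4, n = 12 gives k = 3.
comparisons, assignments = lower_bounds(4, 12)
```

For m = 4 and n = 12 the assignment bound equals m + n = 16, as expected.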


In the first part of this paper we will show that there is a simple asymptotically 
fully optimal stable in-place merging algorithm for every ratio $k \ge \sqrt{m}$. 
Afterward we will introduce a novel stable in-place merging algorithm that is 
asymptotically fully optimal for any ratio k. The new algorithm has a modular 
structure and does not rely on the techniques described by Mannila and Ukko- 
nen [8] in contrast to all other works ([10,4,2,6]) known to us. Instead it exploits 
knowledge about the ratio of the input sizes to achieve optimality. In its core 
our algorithm consists of two separated operations named “Block rearrangement” 
and “Local merges”. The separation allowed the omitting of the extraction of an 
additional movement imitation buffer as e.g. necessary in [6]. For core parts of 
the new algorithm we will give an implementation in Pseudo-Code. Some bench- 
marks will show that it performs better than its competitor proposed in [6] for 
a wide range of inputs. 

A first conceptual description of a stable asymptotically fully optimal in- 
place merging algorithm can be found in the work of Symvonis [10]. Further work 
was done by Geffert et al. [4] and Chen [2] where Chen presented a simplified 
variant of Geffert et al’s algorithm. All three publications delivered neither an 
implementation in Pseudo-Code nor benchmarks. Recently Kim and Kutzner 
[6] published a further algorithm together with benchmarks. These benchmarks 
proved that stable asymptotically fully optimal in-place merging algorithms are 
competitive and don’t have to be viewed as theoretical models merely. 


2 A simple asymptotically optimal algorithm for $k \ge \sqrt{m}$ 


Algorithm        Arguments                               Comparisons                       Assignments
---------------  --------------------------------------  --------------------------------  ------------------------------------
Hwang and Lin    u, v with $|u| \le |v|$; let            $m(t + 1) + n/2^t$                (1) ext. buffer: $2m + n$
                 $m = |u|$, $n = |v|$,                                                     (2) m rotat.: $n + m^2 + m$
                 $t = \lfloor \log(n/m) \rfloor$
Block Swapping   u, v with $|u| = |v|$                   -                                 $3|u|$
Block Rotation   u, v                                    -                                 $|u| + |v| + \gcd(|u|, |v|)$
                                                                                           $\le 2(|u| + |v|)$
Binary Search    u, x (searched element)                 $\lfloor \log |u| \rfloor + 1$    -
Minimum Search   u                                       $|u| - 1$                         -
Insertion Sort   u, let $m = |u|$                        … $+ (m - 1)$                     …

Table 1. Complexity of the Toolbox-Algorithms 


We now introduce some notations that will be used throughout the paper. 
Let u and v be two ascending sorted sequences. We define $u \le v$ ($u < v$) iff 
$x \le y$ ($x < y$) for all elements $x \in u$ and for all elements $y \in v$. $|u|$ denotes the 
size of the sequence u. Unless stated otherwise, m and n ($m \le n$) are the sizes 
of two input sequences u and v respectively. $\delta$ always denotes some block-size 
with $\delta \le m$. 

Tab. 1 contains the complexity regarding comparisons and assignments for six 
elementary algorithms that we will use throughout this paper. Brief descriptions 
of these algorithms except for “Minimum Search” can be found in [6]. In the case 
of “Minimum Search” we assume that u is unsorted, therefore a linear search is 
necessary. 
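
To illustrate the assignment count given for “Block Rotation” in Tab. 1, the following sketch rotates two adjacent sections with the well-known cycle-based (“juggling”) technique and counts every element assignment. The function name `block_rotate` is hypothetical, and it is an assumption that this simple variant matches the counting of the optimal rotation algorithm cited in this paper; it is not taken from [6].

```python
from math import gcd

def block_rotate(a, first, mid, last):
    """Turn a[first:mid] a[mid:last] = u v into v u, in place.

    Returns the number of element assignments performed, which is
    |u| + |v| + gcd(|u|, |v|) for non-empty u and v.
    """
    n, shift = last - first, mid - first
    if shift == 0 or shift == n:
        return 0                          # one side is empty, nothing to move
    assignments = 0
    for start in range(gcd(n, shift)):    # one cycle per residue class
        tmp = a[first + start]
        assignments += 1
        i = start
        while True:
            j = (i + shift) % n
            if j == start:
                break
            a[first + i] = a[first + j]
            assignments += 1
            i = j
        a[first + i] = tmp
        assignments += 1
    return assignments

data = list(range(10))
count = block_rotate(data, 0, 4, 10)      # |u| = 4, |v| = 6
```

Here `count` is 4 + 6 + gcd(4, 6) = 12, and `data` starts with the former section v.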

First we will show that there is a simple stable merging algorithm called 
BLOCK-ROTATION-MERGE that is asymptotically fully optimal for any ratio 
$k \ge \sqrt{m}$. Afterward we will prove that there is a relation between the number of 
different elements in the shorter input sequence u and the number of assignments 
performed by the rotation based variant of Hwang and Lin’s algorithm [5]. 

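Algorithm 1: BLOCK-ROTATION-MERGE (u, v, $\delta$) 

1. We split the sequence u into blocks $u_1 u_2 \ldots u_{\lceil m/\delta \rceil}$ so that all sections $u_2$ to 
$u_{\lceil m/\delta \rceil}$ are of equal size $\delta$ and $u_1$ is of size $m \bmod \delta$. Let $x_i$ be the last element 
of $u_i$ ($i = 1, \cdots, \lceil m/\delta \rceil$). Using binary searches we compute a splitting of v 
into sections $v_1 v_2 \ldots v_{\lceil m/\delta \rceil}$ so that $v_i < x_i \le v_{i+1}$ ($i = 1, \cdots, \lceil m/\delta \rceil - 1$). 

2. $u_1 u_2 \ldots u_{\lceil m/\delta \rceil} v_1 v_2 \ldots v_{\lceil m/\delta \rceil}$ is reorganized to 
$u_1 v_1 u_2 v_2 \ldots u_{\lceil m/\delta \rceil} v_{\lceil m/\delta \rceil}$ using $\lceil m/\delta \rceil - 1$ many rotations. 

3. We locally merge all pairs $u_i v_i$ using $\lceil m/\delta \rceil$ calls of the rotation based variant 
of Hwang and Lin’s algorithm ([5]). 

The steps 2 and 3 are interlaced as follows: After creating a new pair $u_i v_i$ 
($i = 1, \cdots, \lceil m/\delta \rceil$) as part of the second step we immediately locally merge this pair as 
described in step 3. 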
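
The interlaced steps can be sketched as follows. This is an illustration under stated assumptions, not the authors’ implementation: the names `rotate`, `rotation_merge` and `block_rotation_merge` are hypothetical, the rotation uses three reversals instead of an assignment-optimal rotation, and the local merge is a plain “binary search plus rotation” merge standing in for the rotation based variant of Hwang and Lin’s algorithm.

```python
from bisect import bisect_left

def rotate(a, first, mid, last):
    """Turn a[first:mid] a[mid:last] into a[mid:last] a[first:mid] in place."""
    if first == mid or mid == last:
        return
    def reverse(i, j):
        j -= 1
        while i < j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    reverse(first, mid)
    reverse(mid, last)
    reverse(first, last)

def rotation_merge(a, first, mid, last):
    """Stable merge of a[first:mid] and a[mid:last] (stand-in for Hwang and Lin)."""
    while first < mid < last:
        # elements of v strictly smaller than the head of u must precede it
        b = bisect_left(a, a[first], mid, last)
        rotate(a, first, mid, b)
        first += (b - mid) + 1            # head of u is now in its final place
        mid = b

def block_rotation_merge(a, first, mid, last, delta):
    """Merge u = a[first:mid] and v = a[mid:last] block by block."""
    size = (mid - first) % delta or delta     # size of the first block of u
    while first < mid:
        end = first + size
        if end == mid:                        # last block: merge with the rest of v
            rotation_merge(a, first, mid, last)
            return
        b = bisect_left(a, a[end - 1], mid, last)   # v_i = a[mid:b] < x_i
        rotate(a, end, mid, b)                # move v_i in front of the remaining blocks
        rotation_merge(a, first, end, end + (b - mid))
        first, mid, size = end + (b - mid), b, delta

data = [1, 3, 5, 7, 9, 2, 3, 4, 8, 10, 11]
block_rotation_merge(data, 0, 5, len(data), 2)
```

Because every binary search looks for the first element of v that is not smaller than the reference element of u, equal elements of u stay in front of equal elements of v, which keeps the sketch stable.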


Lemma 1. BLOCK-ROTATION-MERGE performs $\frac{m^2}{2 \cdot \delta} + 2n + 6m + m \cdot \delta$ many 
assignments at most if we use the optimal algorithm from Dudzinski and Dydek 
[3] for all block-rotations. 


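Proof. For the first rotation from $u_1 u_2 \cdots u_{\lceil m/\delta \rceil} v_1$ to $u_1 v_1 u_2 \cdots u_{\lceil m/\delta \rceil}$ the 
algorithm performs $|u_2| + \cdots + |u_{\lceil m/\delta \rceil}| + |v_1| + \gcd(|u_2| + \cdots + |u_{\lceil m/\delta \rceil}|, |v_1|)$ 
assignments. The second rotation from $u_2 u_3 \cdots u_{\lceil m/\delta \rceil} v_2$ to $u_2 v_2 u_3 \cdots u_{\lceil m/\delta \rceil}$ requires 
$|u_3| + \cdots + |u_{\lceil m/\delta \rceil}| + |v_2| + \gcd(|u_3| + \cdots + |u_{\lceil m/\delta \rceil}|, |v_2|)$ assignments, and so on. For 
the last rotation from $u_{\lceil m/\delta \rceil - 1} u_{\lceil m/\delta \rceil} v_{\lceil m/\delta \rceil - 1}$ to $u_{\lceil m/\delta \rceil - 1} v_{\lceil m/\delta \rceil - 1} u_{\lceil m/\delta \rceil}$ 
the algorithm requires $|u_{\lceil m/\delta \rceil}| + |v_{\lceil m/\delta \rceil - 1}| + \gcd(|u_{\lceil m/\delta \rceil}|, |v_{\lceil m/\delta \rceil - 1}|)$ assignments. 
Additionally $\frac{m}{\delta}(3\delta + 3\delta + \delta^2) = 6m + m \cdot \delta$ assignments are required for the local 
merges. Altogether the algorithm performs 
$\delta \cdot ((\frac{m}{\delta} - 1) + (\frac{m}{\delta} - 2) + \cdots + 1) + n + n + 6m + m \cdot \delta 
= \frac{m^2 - m\delta}{2\delta} + 2 \cdot n + 6m + m \cdot \delta \le \frac{m^2}{2\delta} + 2 \cdot n + 6m + m \cdot \delta$ 
assignments at most. 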


Lemma 2. BLOCK-ROTATION-MERGE is asymptotically optimal regarding the 
number of comparisons. 


Corollary 1. If we assume a block-size of $\lfloor \sqrt{m} \rfloor$ then BLOCK-ROTATION-MERGE 
is asymptotically fully optimal for all $k \ge \sqrt{m}$. 
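
For the assignments, the corollary can be checked by inserting the block-size into the bound $\frac{m^2}{2\delta} + 2n + 6m + m \cdot \delta$ of Lemma 1. The following calculation is a sketch under the simplifying assumption $\delta = \sqrt{m}$ (the floor is ignored); the comparisons are covered by Lemma 2.

```latex
\begin{align*}
\frac{m^2}{2\delta} + 2n + 6m + m\cdot\delta
  &= \frac{m\sqrt{m}}{2} + 2n + 6m + m\sqrt{m}
   = \tfrac{3}{2}\,m\sqrt{m} + 2n + 6m \\
  &\le \tfrac{3}{2}\,n + 2n + 6m = O(m+n),
  \qquad\text{since } k \ge \sqrt{m} \text{ implies } n = k\cdot m \ge m\sqrt{m}.
\end{align*}
```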


So, for $k \ge \sqrt{m}$ there is a quite primitive asymptotically fully optimal stable 
in-place merging algorithm. In the context of complexity deliberations in the 
next section we will rely on the following Lemma. 


Lemma 3. Let $\lambda$ be the number of different elements in u. Then the number of 
assignments performed by the rotation based variant of Hwang and Lin’s 
algorithm is $O(\lambda \cdot m + n) = O((\lambda + k) \cdot m)$. 


Proof. Let $u = u_1 u_2 \ldots u_\lambda$, where every $u_i$ ($i = 1, \cdots, \lambda$) is a maximally sized 
section of equal elements. We split v into sections $v_1 v_2 \ldots v_\lambda v_{\lambda+1}$ so that we get 
$v_i < u_i \le v_{i+1}$ ($i = 1, \cdots, \lambda$). (Some $v_i$ can be empty.) We assume that Hwang 
and Lin’s algorithm already merged a couple of sections and comes to the first 
element of the section $u_i$ ($i = 1, \cdots, \lambda$). The algorithm now computes the section 
$v_i$ and moves it in front of $u_i$ using one rotation of the form $\cdots u_i \ldots u_\lambda v_i \cdots$ to 
$\cdots v_i u_i \ldots u_\lambda \cdots$. This requires $|u_i| + \cdots + |u_\lambda| + |v_i| + \gcd(|u_i| + \cdots + |u_\lambda|, |v_i|) \le 
2(m + |v_i|)$ many assignments. Afterward the algorithm continues with the second 
element in $u_i$. Obviously there is nothing to move at this stage because all 
elements in $u_i$ are equal and the smaller elements from v were already moved 
in the step before. Because we have only $\lambda$ different sections we proved our 
conjecture. 


Corollary 2. Hwang and Lin’s algorithm is fully asymptotically optimal if we 
have either $k \ge m$ or $k \ge \lambda$ where $\lambda$ is the number of different elements in the 
shorter input sequence u. 


3 Novel asymptotically optimal stable in-place merging 
algorithm 


We will now propose a novel stable in-place merging algorithm called 
STABLE-OPTIMAL-BLOCK-MERGE that is fully asymptotically optimal for any ratio. 
Notable properties of our algorithm are: It does not rely on the block management 
techniques described in Mannila and Ukkonen’s work [8] in contrast to all other 
such algorithms proposed so far. It degenerates to the simple 
BLOCK-ROTATION-MERGE algorithm for roughly $k \ge \sqrt{m}/2$. The internal buffer for local merges 
and the movement imitation buffer share a common buffer area. The two 
operations “block rearrangement” and “local merges” stay separated and communicate 
using a common block distribution storage. There is no lower bound regarding 
the size of the shorter input sequence. 

Algorithm 2: STABLE-OPTIMAL-BLOCK-MERGE 

Step 1: Block distribution storage assignment 
Let $\delta = \lfloor \sqrt{m} \rfloor$ be our block-size. We split the input sequence u into $u = s_1 t s_2 u'$ 
so that $s_1$ and $s_2$ are two sequences of size $\lfloor m/\delta \rfloor + \lfloor n/\delta \rfloor$ and t is a sequence 
of maximal size with elements equal to the last element of $s_1$. We assume that 
there are enough elements to get a nonempty $u'$ and call $s_1$ together with $s_2$ our 
block distribution storage (in the following shortened to bd-storage). 
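
The splitting of Step 1 only needs index arithmetic and one binary search. The following sketch computes the section boundaries; the function name `split_bd_storage` and the convention of returning `None` when not enough elements are available are assumptions made for illustration (the exceptional case itself is treated further below in the text).

```python
from bisect import bisect_right
from math import isqrt

def split_bd_storage(u, n):
    """Return (s1_end, t_end, s2_end) for u = s1 t s2 u', or None.

    s1 = u[:s1_end], t = u[s1_end:t_end], s2 = u[t_end:s2_end],
    u' = u[s2_end:]. None signals that u' would be empty.
    """
    m = len(u)
    if m == 0:
        return None
    delta = isqrt(m)                       # block-size floor(sqrt(m))
    size = m // delta + n // delta         # |s1| = |s2|
    if size >= m:
        return None
    # t is the maximal run of elements equal to the last element of s1
    t_end = bisect_right(u, u[size - 1], size, m)
    s2_end = t_end + size
    if s2_end >= m:
        return None
    return size, t_end, s2_end

u = list(range(20)) + [19] * 3 + list(range(20, 97))   # m = 100
bounds = split_bd_storage(u, 100)
```

Skipping the run t guarantees that every element of the first section is strictly smaller than every element of the second section.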


Step 2: Buffer extraction 


Fig. 1. Segmentation after the buffer extraction (figure labels: elements 
originating from u; all elements in the buffer area are distinct) 


In front of the remaining sequence $u'$ we extract an ascending sorted buffer b of 
size $\delta$ so that all pairs of elements inside b are distinct. Simple techniques for 
doing so are proposed e.g. in [9] or [6]. Once more we assume that there are enough 
elements to do so. Now let w be the remaining right part of $u'$ after the buffer 
extraction. The segmentation of our input sequences after the buffer extraction 
is shown in Fig. 1. 


Step 3: Block rearrangement 


Fig. 2. Graphical remarks to the block rearrangement process, panels (a) to (c) 
(figure labels: buffer area, here used as movement imitation buffer; w-block; 
v-block; first and second section of the block distribution storage) 


We logically split the sequence wv into blocks of equal size $\delta$ as shown in Fig. 
2 (a). The two blocks $w_1$ and $v_{\lfloor |v|/\delta \rfloor + 1}$ are undersized and can even be empty. In 
the following we call every block originating from w a w-block and every block 
originating from v a v-block. The minimal w-block of a sequence of w-blocks 
is always the w-block with the lowest order (smallest elements) regarding the 
original order of these blocks. 

We rearrange all blocks except for the two undersized blocks $w_1$ and $v_{\lfloor |v|/\delta \rfloor + 1}$ 
so that the following 3 properties hold: 

(1) If a v-block is followed by a w-block, then the last element of the v-block 
must be smaller than the first element of the w-block (Fig. 2(b)). 

(2) If a w-block is followed by a v-block, then the first element of the w-block 
must be smaller or equal to the last element of the v-block (Fig. 2(b)). 

(3) The relative order of the v-blocks as well as w-blocks stays unchanged. 

This rearrangement can be easily realized by “rolling” the w-blocks through 
the v-blocks and “dropping” minimal w-blocks so that the above properties are 
fulfilled. During this rolling the w-blocks stay together as a group but they can be 
moved out of order. So, due to the need for stability, we have to track their 
positions. For this reason we mirror all block replacements in the buffer area using 
a technique called movement imitation (the technique of movement imitation is 
described e.g. in [10] and [6]). Every time a minimal w-block has been dropped 
we can find the position of the next minimal block using this buffer area. 
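
The idea behind movement imitation can be shown with a toy model. In the following sketch the w-blocks are represented by labels and the buffer by a list of distinct, initially ascending keys; the names `swap_blocks` and `minimal_block` are hypothetical, and the sketch only mirrors swaps, not the complete rolling process.

```python
def swap_blocks(blocks, buf, i, j):
    """Swap two w-blocks and mirror the swap in the imitation buffer."""
    blocks[i], blocks[j] = blocks[j], blocks[i]
    buf[i], buf[j] = buf[j], buf[i]

def minimal_block(buf):
    """Index of the smallest buffer element, i.e. of the minimal w-block."""
    return min(range(len(buf)), key=buf.__getitem__)

blocks = ["w2", "w3", "w4"]     # original order of the w-blocks
buf = [10, 20, 30]              # distinct buffer elements, ascending
swap_blocks(blocks, buf, 0, 2)  # the group gets out of order
pos = minimal_block(buf)        # still finds the block that was first
```

Since the buffer elements are distinct and start out sorted, the permutation of the buffer always equals the permutation of the w-blocks, so a linear minimum search in the buffer locates the minimal w-block.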

Later we will have to find the positions of w-blocks in the block-sequence created 
as output of the rearrangement process. For this purpose we store the positions 
of w-blocks in the block distribution storage as follows: 

The block distribution storage consists of two sections of size $\lfloor m/\delta \rfloor + \lfloor n/\delta \rfloor$ and 
the i-th element of the first section together with the i-th element of the second 
section belong to the i-th block in the result of the rearrangement process. Please 
note that, due to the technique used for constructing the bd-storage, such pairs 
of elements are always different with the first one smaller than the second one. If 
the i-th block originates from w we exchange the corresponding elements in the 
bd-storage, otherwise we leave them untouched. Fig. 2(c) shows this graphically. 
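
The encoding of one bit per block can be sketched in a few lines. The helper names `mark_w_block` and `is_w_block` are hypothetical; `bds1` and `bds2` denote the start indices of the two sections of the bd-storage inside the array `a`.

```python
def mark_w_block(a, bds1, bds2, i):
    """Exchange the i-th pair of bd-storage elements (marks or unmarks block i)."""
    a[bds1 + i], a[bds2 + i] = a[bds2 + i], a[bds1 + i]

def is_w_block(a, bds1, bds2, i):
    """A pair is marked iff its first element is larger than its second one."""
    return a[bds1 + i] > a[bds2 + i]

a = [1, 2, 3, 7, 8, 9]          # first section 1 2 3, second section 7 8 9
mark_w_block(a, 0, 3, 1)        # block 1 originates from w
marked = [is_w_block(a, 0, 3, i) for i in range(3)]
mark_w_block(a, 0, 3, 1)        # exchanging again restores the original order
```

The scheme works without extra space because every element of the first section is strictly smaller than every element of the second section.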

Step 4: Local merges 
We visit every w-block and proceed as follows: 

Let p be the w-block to be merged and let q be the sequence of all v-originating 
elements immediately to the right of p that are still unmerged. Further let x be 
the first element of p. 

(1) Using a binary search we split q into $q = q_1 q_2$ so that we get $q_1 < x \le q_2$. 
It holds $|q_1| < \delta$ due to the block rearrangement applied before. 
(2) We rotate $p q_1 q_2$ to $q_1 p q_2$. 
(3) We locally merge p and $q_2$ by Hwang and Lin’s algorithm, 
where we use the buffer area as internal buffer. 
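
One such local merge can be sketched as follows. The function name `merge_w_block` is hypothetical, the rotation is done with a slice assignment, and a temporary copy of p stands in for the internal buffer; a plain linear merge replaces Hwang and Lin’s algorithm, so the sketch shows the data movement but not the comparison bound.

```python
from bisect import bisect_left

def merge_w_block(a, p, delta, q_end):
    """Merge the w-block a[p:p+delta] with the unmerged v-elements a[p+delta:q_end].

    Returns the index where the merged region starts to contain elements of p,
    which is the right boundary for the next w-block further to the left.
    """
    x = a[p]
    b = bisect_left(a, x, p + delta, q_end)    # q1 = a[p+delta:b] < x <= q2 = a[b:q_end]
    a[p:b] = a[p + delta:b] + a[p:p + delta]   # rotate p q1 to q1 p
    start = b - delta                          # p now begins here
    buf = a[start:b]                           # copy of p (stand-in for the internal buffer)
    i, j, k = 0, b, start
    while i < len(buf):
        if j < q_end and a[j] < buf[i]:        # strict comparison keeps the merge stable
            a[k] = a[j]
            j += 1
        else:
            a[k] = buf[i]
            i += 1
        k += 1
    return start

a = [5, 6, 7, 1, 2, 6, 8, 9]
new_last = merge_w_block(a, 0, 3, len(a))
```

The write position never overtakes the read position in $q_2$, because the gap between them equals the number of elements of p that are still waiting in the buffer.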

This visiting process starts with the rightmost w-block and moves sequen- 
tially w-block by w-block to the left. The positions of the w-blocks are detected 
using the information held in the bd-storage. Every time we locate the 
position of a w-block in the bd-storage we bring the corresponding bd-storage 
elements back to their original order. So, after finishing all local merges both 
sections of the bd-storage are restored to their original form. 

Step 5: Final sweeping up 
On the left there is a still unmerged sub-sequence $s_1 t s_2 b w_1 v'$ where $v'$ is the 
subsection of v that consists of the remaining unmerged elements. We proceed 
as follows: 
(1) We split $v'$ into $v' = v'_1 v'_2$ so that $v'_1 < x \le v'_2$ where x is the last 
element of $s_2$. Afterward we rotate $b w_1 v'_1 v'_2$ to $v'_1 b w_1 v'_2$ and locally merge $w_1$ and 
$v'_2$ using Hwang and Lin’s algorithm with the internal buffer. 
(2) In the same way we split $v'_1$ into $v'_1 = v'_{1,1} v'_{1,2}$ so that we get $v'_{1,1} < y \le v'_{1,2}$ where y is the 
last element of $s_1$. We rotate $s_1 t s_2 v'_{1,1} v'_{1,2}$ to $s_1 v'_{1,1} t s_2 v'_{1,2}$ and locally merge $s_1$ 
with $v'_{1,1}$ and $s_2$ with $v'_{1,2}$ using the BLOCK-ROTATION-MERGE algorithm with 
a block-size of $\lfloor \sqrt{m} \rfloor$. 
(3) We sort the buffer area using INSERTION-SORT and 
merge it with all elements right of it using the rotation based variant of Hwang 
and Lin’s algorithm. 

Lack of Space in Step 1: 
The inputs are so asymmetric that $u'$ becomes empty. Using a binary search 
we split v into $v = v_1 v_2$ so that we get $v_1 < t \le v_2$ and rotate $s_1 t s_2 v_1 v_2$ 
to $s_1 v_1 t s_2 v_2$. Using the BLOCK-ROTATION-MERGE algorithm with a block-size 
$\lfloor \sqrt{m} \rfloor$ we locally merge $s_1$ with $v_1$ and $s_2$ with $v_2$. If $s_2$ is empty we ignore it 
and directly merge $s_1$ with v in the same style. 

Extracted buffer smaller than $\lfloor \sqrt{m} \rfloor$ in Step 2: 
We assume that we could extract a buffer of size $\lambda$ with $\lambda < \lfloor \sqrt{m} \rfloor$. We change 
our block-size $\delta$ to $\lfloor |u|/\lambda \rfloor$ and apply the algorithm as described but with the 
modification that we use the rotation based variant of Hwang and Lin’s algorithm 
for all local merges. 


Corollary 3. STABLE-OPTIMAL-BLOCK-MERGE is stable. 


Theorem 1. The STABLE-OPTIMAL-BLOCK-MERGE algorithm requires 
$O(m + n) = O(m \cdot (k + 1))$ assignments. 


Proof. It is enough to prove that every step is performed with $O(m + n)$ 
assignments. In the first step no assignments occur at all. The buffer extraction in 
step 2 requires $O(m)$ assignments, as shown in [6]. In step 3 the “rolling” of the 
w-blocks through the v-blocks together with the “dropping” of the minimal 
w-blocks requires $3\sqrt{m} \cdot (\sqrt{m} + \frac{n}{\sqrt{m}}) = O(m + n)$ assignments. The rotations for the 
integrated “movement imitation” contribute $O(\sqrt{m} \cdot (\sqrt{m} + \frac{n}{\sqrt{m}})) = O(m + n)$ 
assignments. The marking of the positions of the w-blocks in the bd-storage needs 
$O(m)$ assignments. So, altogether step 3 requires $O(m + n)$ assignments. In step 
4 each w-block rotation requires $\sqrt{m} + \sqrt{m} + \gcd(\sqrt{m}, \sqrt{m}) = 3\sqrt{m}$ assignments 
at most. So all w-block rotations need $3\sqrt{m} \cdot \sqrt{m} = O(m)$ assignments. The local 
mergings using Hwang and Lin’s algorithm consume $2m + n$ assignments 
altogether. The reconstruction of the original order of the exchanged elements in the 
bd-storage contributes $O(\sqrt{m})$ assignments. In step 5 the first rotation requires 
$4\sqrt{m}$ assignments at most and the local merging of $w_1$ and $v'_2$ needs $3\sqrt{m}$ 
assignments at most. The second rotation requires $3\sqrt{m} + \frac{n}{\sqrt{m}}$ assignments at most. 
The success in step 1 implies that roughly $k < \sqrt{m}/2$, so we get $k \cdot \sqrt{m} < m$. 
Further we have that $\lfloor m/\delta \rfloor + \lfloor n/\delta \rfloor$ is roughly equal to $(k + 1) \cdot \sqrt{m} = \frac{m + n}{\sqrt{m}}$. So, 
according to Lemma 1 each local merging with BLOCK-ROTATION-MERGE requires 
$\frac{k^2 \cdot m}{2\sqrt{m}} + 2 \cdot n + 6k\sqrt{m} + k(\sqrt{m} \cdot \sqrt{m}) \le \frac{k \cdot m}{2} + 2n + 6m + k \cdot m$ assignments at 
most. The buffer sorting using insertion sort contributes $O(m)$ assignments and 
the final call of Hwang and Lin’s algorithm requires $n + m + \sqrt{m}$ assignments. 
So, step 5 needs altogether $O(m + n)$ assignments as well. 

In the first exceptional case “Lack of Space in Step 1” we have roughly $k \ge \sqrt{m}/2$ 
and directly switch to BLOCK-ROTATION-MERGE. According to Corollary 1 
BLOCK-ROTATION-MERGE is fully asymptotically optimal for such k. 

In the second exceptional case “Extracted buffer smaller than $\lfloor \sqrt{m} \rfloor$” we change 
the block-size to $\lfloor |u|/\lambda \rfloor$ with $\lambda < \sqrt{m}$ and use the rotation based variant of 
Hwang and Lin’s algorithm for local merges. A recalculation of the steps 3 to 5, 
where we use Lemma 3 in the context of all local merges, proves that the number 
of assignments is still $O(m + n)$. 


Lemma 4. If $k = \sum_{i=1}^{n} k_i$ for any $k_i > 0$ and integer $n > 0$, then 
$\sum_{i=1}^{n} \log k_i \le n \log(k/n)$. 


Proof. It holds because the function log x is concave. 
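
The concavity argument can be spelled out as an application of Jensen’s inequality. The following derivation is a sketch added for illustration; it only uses the symbols of Lemma 4.

```latex
\[
\frac{1}{n}\sum_{i=1}^{n}\log k_i
  \;\le\; \log\!\Bigl(\frac{1}{n}\sum_{i=1}^{n}k_i\Bigr)
  = \log\frac{k}{n}
\quad\Longrightarrow\quad
\sum_{i=1}^{n}\log k_i \;\le\; n\log\frac{k}{n}.
\]
```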


Theorem 2. The STABLE-OPTIMAL-BLOCK-MERGE algorithm requires 
$O(m \log(\frac{n}{m} + 1)) = O(m \log(k + 1))$ comparisons. 


Proof. As in the case of the assignments it is enough to show that every step 
keeps the asymptotic optimality. Step 1 merely contains one binary search over m 
elements. The buffer extraction in step 2 requires m comparisons at most, as shown 
in [6]. The rearrangement of all blocks except for the two undersized blocks $w_1$ 
and $v_{\lfloor |v|/\delta \rfloor + 1}$ in step 3 requires $2\sqrt{m} + \frac{n}{\sqrt{m}}$ comparisons at most. The detection of 
the minimal element in the movement imitation buffer demands $\sqrt{m} \cdot \sqrt{m}$ many 
comparisons at most. In step 4 the binary searches for splitting the q-sequences 
cost $\sqrt{m} \cdot \log \sqrt{m}$ comparisons at most. Now let $(m_1, n_1), (m_2, n_2), \cdots, (m_r, n_r)$ 
be the sizes of all r groups that are locally merged by Hwang and Lin’s 
algorithm. According to Lemma 4, Table 1 and since $r \le \sqrt{m}$ this task requires 
$\sum_{i=1}^{r} (m_i(\log(\frac{n_i}{m_i}) + 1) + m_i) = \sum_{i=1}^{r} (m_i \log(\frac{n_i}{m_i}) + 2m_i) \le \sum_{i=1}^{r} m_i \log(\frac{n_i}{m_i}) + 2m 
= \sum_{i=1}^{r} (m_i \log n_i - m_i \log m_i) + 2m = \sqrt{m} \sum_{i=1}^{r} (\log n_i - \log m_i) + 2m 
\le \sqrt{m}(\sqrt{m} \log \frac{n}{\sqrt{m}} - \sqrt{m} \log \frac{m}{\sqrt{m}}) + 2m \le m(\log(\frac{n}{m} + 1)) + 2m = O(m \log(\frac{n}{m} + 1))$ 
comparisons. The asymptotic optimality in step 5 as well as in the exceptional 
case “Lack of Space in Step 1” is obvious due to Lemma 2. The change of the 
block-size in the second exceptional case “Extracted buffer smaller than $\lfloor \sqrt{m} \rfloor$” 
triggers a simple recalculation of step 3 and step 4, where we leave the details 
to the reader. 


Corollary 4. STABLE-OPTIMAL-BLOCK-MERGE is an asymptotically fully op- 
timal stable in-place merging algorithm. 


Pseudo-code implementations for the core operations “block rearrangement” and 
“local merges” are given in Alg. 1 and Alg. 2, respectively. Both code segments 
contain calls of the toolbox algorithms mentioned in section 2. The Pseudo-code 
definitions for these toolbox algorithms are summarized in Tab. 2. 


Pseudo-code Definition                   Description of the Arguments 
---------------------------------------  ---------------------------------------------- 
HWANG-AND-LIN(A, first1, first2, last)   u is in A[first1 : first2 − 1], 
                                         v is in A[first2 : last − 1] 
BINARY-SEARCH(A, first, last, x)         delivers the position of the 
                                         first occurrence of x in A[first : last − 1] 
MINIMUM(A, pos1, pos2)                   delivers the index of the minimal element 
                                         in A[pos1 : pos2 − 1] 
BLOCK-SWAP(A, pos1, pos2, len)           u is in A[pos1 : pos1 + len − 1], 
                                         v is in A[pos2 : pos2 + len − 1] 
BLOCK-ROTATE(A, first1, first2, last)    u, v as in HWANG-AND-LIN 
EXCHANGE(A, pos1, pos2)                  is equal to BLOCK-SWAP(A, pos1, pos2, 1) 

Table 2. Pseudo-code Definitions of the Toolbox Algorithms 


Algorithm 1 Pseudo-code of the procedure for the block rearrangement 

REARRANGE-BLOCKS(A, first1, first2, last, buf, bds1, bds2, blockSize) 
 1  ▷ the full w-blocks are in A[first1 : first2 − 1], the full v-blocks are in A[first2 : last − 1] 
 2  ▷ buffer b is in A[buf : buf + ⌊√m⌋ − 1] 
 3  ▷ bd-storage s{1|2} is in A[bds{1|2} : bds{1|2} + ⌊√m⌋ + ⌊n/√m⌋ − 1] 
 4 
 5  bufEnd ← buf + (first2 − first1) / blockSize 
 6  minBlock ← first1 
 7  while first1 < first2 
 8    do if first2 + blockSize < last and A[first2 + blockSize − 1] < A[minBlock] 
 9         then BLOCK-SWAP(A, first1, first2, blockSize) 
10              BLOCK-ROTATE(A, buf, buf + 1, bufEnd) 
11              if minBlock = first1 
12                then minBlock ← first2 
13              first2 ← first2 + blockSize 
14         else BLOCK-SWAP(A, minBlock, first1, blockSize) 
15              EXCHANGE(A, buf, buf + (minBlock − first1) / blockSize) 
16              EXCHANGE(A, bds1, bds2) 
17              buf ← buf + 1 
18              if buf < bufEnd 
19                then minIndex ← MINIMUM(A, buf, bufEnd) 
20                     minBlock ← first1 + (minIndex − buf) × blockSize 
21       bds1 ← bds1 + 1; bds2 ← bds2 + 1 
22       first1 ← first1 + blockSize 


3.1 Optimizations 


We now report about several optimizations that help improving the performance 
of the algorithm without any impact on its asymptotic properties. The immediate 
mirroring of all w-block movements in the movement imitation buffer (occurs in 
Step 3) triggers a rotation (line 10 in Alg. 1) every time when a v-block is moved 


Algorithm 2 Pseudo-code of the function for local merges 

LOCAL-MERGES(A, first, last, buf, bds1, bds2, blockSize, numBlocks) 
  ▷ A[first : last − 1] contains all blocks in distributed form 
  index ← ((last − first) / blockSize) − 1 
  while numBlocks > 0 
    do while A[bds1 + index] < A[bds2 + index] 
         do index ← index − 1 
       first2 ← first + ((index + 1) × blockSize) 
       if first2 < last 
         then b ← BINARY-SEARCH(A, first2, last, A[first2 − blockSize]) 
              BLOCK-ROTATE(A, first2 − blockSize, first2, b) 
              HWANG-AND-LIN(A, b − blockSize, b, last, buf) 
              last ← b − blockSize 
       EXCHANGE(A, bds1 + index, bds2 + index) 
       numBlocks ← numBlocks − 1; index ← index − 1 
  return last 


in front of the group of w-blocks. The number of necessary rotations can be 
reduced by first counting the number of v-blocks moved in front of the 
w-blocks. The movement imitation buffer is then updated only once, namely 
when a minimal w-block is placed. In the context of the movement 
of v-blocks into front of w-blocks (Step 3) the floating hole technique (for a 
description see [4] or [6]) can be applied for reducing the number of assignments. 
Similarly the floating hole technique can also be applied during the local merges 
(Step 4) by combining the block swap to the internal buffer with the rotation 
that moves smaller v-originating elements to the front of the w-block. In the 
special case “Extracted buffer smaller than $\lfloor \sqrt{m} \rfloor$” the sorting of the buffer b in 
Step 5 is unnecessary because the buffer is already sorted after Step 3 and stays 
unchanged during Step 4. Insertion-Sort can be replaced by some more efficient 
sorting algorithm. Please note that there is no need for stability in the context 
of the buffer sorting because all buffer elements are distinct. 


4 Experimental work 


We did some experimental work with our algorithm in order to get an impression 
of its performance. We compared it with the stable fully asymptotically optimal 
algorithm presented in [6] as well as the simple standard algorithm that relies 
on external space of size m. Tab. 3 summarizes the results of our experimental 
work, where every line shows average values for 50 runs with different data. We 
used a standard desktop computer with a 2 GHz processor as hardware platform. 
All coding happened in the C programming language. For the measurement 
of the number of assignments we applied the optimal block rotation algorithm 


              STABLE-O.-B.-MERGE          SOFSEM 2006 Alg.            Linear Standard Alg. 
n     m       #comp    #assign   te       #comp    #assign   te       #comp    #assign   te 
2^21  2^21    5843212  37551852  227      5961524  49666369  335      4194239  8388608   121 
2^21  2^18    1500433  15866835  100      1505766  17182008  122      2359288  4718592   71 
2^21  2^15    280611   17350896  87       280412   12681115  68       2129890  4259840   64 
2^21  2^12    43611    4422493   35       47330    10512479  53       2100804  4202496   63 
2^23  2^9     8057     16350956  133      8589     38150052  202      8373039  16778240  251 
2^23  2^6     1200     15459824  131      1271     30749720  161      8234508  16777344  254 
2^23  2^3     172      11322991  119      170      7535160   68       7572307  16777232  301 
2^23  2^0     23       4163489   55       24       4163489   55       4225121  16777218  261 

te : Execution time in ms, #comp : Number of comp., #assign : Number of assign., 
m,n : Lengths of inp. seq. 
Table 3. Practical comparison of various merge algorithms 


presented in [3]. Although this algorithm is optimal regarding the number of 
assignments it is quite slow in practice due to its high computational demands. 
Therefore for the time measurements we applied a block-swap based algorithm 
presented e.g. in [1] using identical data. 

Regarding the buffer extraction (Step 2) there are several alternatives. The ex- 
traction process can be started from the left end as well as from the right end 
of the input and we can choose between a binary search and linear search for 
the determination of the next element. All 4 possible combinations keep the 
asymptotic optimality. However, there is no clear “best choice” among them be- 
cause the most advantageous combination can vary depending on the structure 
of the input. In the context of the STABLE-OPTIMAL-BLOCK-MERGE algorithm 
we decided for the variant “starting from the left combined with binary search”, 
the SOFSEM 2006 algorithm already originally chose “starting from the right 
combined with linear search”. 

Except for two combinations of input sizes our new algorithm is always faster 
than its predecessor. The bad performance in the case $(2^{21}, 2^{15})$ reflects the lack 
of the implementation of the floating hole technique as mentioned in the section 
about optimizations. The application of BLOCK-ROTATION-MERGE triggers 
unnecessary rotations in the case $(2^{23}, 2^{3})$. This can be fixed by introducing a 
check whether $k \ge m$ and a direct switch to the rotation based variant of Hwang 
and Lin’s algorithm if true. 


5 Conclusion 


We investigated the problem of stable in-place merging from a ratio based point 
of view by introducing a ratio $k = \frac{n}{m}$, where m,n are the sizes of the input 
sequences with $m \le n$. We could show that there is a simple asymptotically fully 
optimal (optimal regarding the number of comparisons as well as assignments) 
stable in-place merging algorithm for any ratio $k \ge \sqrt{m}$. 

In the second part of this paper we introduced a novel asymptotically fully 
optimal stable in-place merging algorithm which is constructed on the founda- 
tion of deliberations regarding the ratio of the input sizes. Highlights of this 


algorithm are: It has a modular structure and does not rely on techniques de- 
scribed by Mannila and Ukkonen [8] in contrast to all its known competitors 
([10,4,6]). The tasks “block-distribution” and “local block mergings” are modularly 
separated. As side effect they can share a common buffer area and the extrac- 
tion of a separated movement imitation buffer is not necessary. The algorithm 
demands no lower bound for the size of the shorter input sequence (32 elements 
in case of the alg. in [4] and 10 elements for the alg. in [6]). 

Our algorithm performs for a wide range of inputs remarkably better than its 
direct competitor presented in [6]. There is a superiority in particular for sym- 
metrically sized inputs, a fact that is of importance in the context of the Merge- 
sort algorithm. 

The number of comparisons and assignments are good measurements for the 
efficiency of merging algorithms. However, the impact of other operations as e.g. 
numerical calculations and index comparisons deserves investigation as well. As 
motivation we would like to refer to a well known effect with the optimal block- 
rotation algorithm introduced by Dudzinski and Dydek in [3]. Their algorithm 
is optimal regarding the number of assignments but performs badly due 
to an included computation of a greatest common divisor. For our further work 
we plan to include deliberations regarding such so far uncounted operations. 